AI Forensics

Final symposium, “AI and the Society of the Future”, VolkswagenStiftung

Patrick Riechert

HfG Karlsruhe

Giulia Gandolfi

HfG Karlsruhe

23 April 2025

AI Forensics: Accountability through interpretability in visual AI systems.

Staatliche Hochschule für Gestaltung Karlsruhe

Künstliche Intelligenz und Medienphilosophie
Prof. Matteo Pasquinelli

Universität Kassel

Dept. of Participatory IT Design
Prof. Claude Draude

Durham University

Dept. of Computer Science
Prof. Noura Al Moubayed

Cambridge University

Cambridge Digital Humanities
Dr. Leonardo Impett

University of California, Santa Barbara

Center for the Humanities and Machine Learning
Dr. Fabian Offert

AI Forensics: Accountability through interpretability in visual AI systems.

Figure 1

Outline:

  • the original plan: motivation and structure
  • confluence of complications and a crisis of concept
  • research trajectories, strategies, results
  • re/discovering the “soul of the project”
  • outputs ahead

Original formulation

Project aim

enable public accountability through interpretability

Technical investigations rarely lead to practical, interactive instruments that favour accountability; conversely, critical studies of accountability still lack tools for a technical analysis of the AI black box.

This also implies bridging gaps between

  • academic disciplines (critical AI studies; explainable AI/ML)
  • societal sectors (research, activism, art)

Accountability through interpretability

  • developing—and demonstrating—a “generalized sociotechnical methodology”, commensurate to the effects AI brings on a societal level
  • dogfooding the creation of software—a platform of tools enabling an integrated, “full-stack” approach, also for e.g. researchers new to AI analysis
Figure 2: AI Forensics as a synthesis of methodologies

Project dimensions

Integrated methodologies/approaches:

  • Sociohistorical
  • Technical
  • Participatory design

Disassembling and deobfuscating the ‘AI production pipeline’

  • Dataset
  • Model
  • Application

Two pillars/output orientations

  • Sociotechnical case studies
  • Forensics toolkit

Project structure

Sociotechnical case studies

  1. Exposing the production pipeline of visual AI systems
  2. AI Interpretability and accountability in the humanitarian sector
  3. AI design interventions for social diversity
  4. Interpretability and accountability of visual AI systems in the sciences

Toolkit components

  • Data provenance tool
  • Model analysis tool
  • Adversarial attack module (e.g. adversarial patches)
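The adversarial-attack idea can be illustrated with a toy sketch: an FGSM-style perturbation against a made-up linear classifier. This is not the project's toolkit; all weights and inputs here are synthetic, and a real adversarial patch would target a deep vision model rather than a logistic classifier.

```python
import numpy as np

# Toy FGSM-style adversarial example against a fixed linear classifier.
# Everything here (weights, input, epsilon) is illustrative.
rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.uniform(0.0, 1.0, size=64)   # a hypothetical flattened "image" in [0, 1]
w = rng.normal(size=64)              # assumed classifier weights
b = 0.1

p_clean = sigmoid(w @ x + b)         # P(class = 1) on the clean input

# FGSM: step in the direction of the sign of the loss gradient w.r.t. x.
# For logistic loss with true label y = 1, grad_x = (p - y) * w.
y = 1.0
grad_x = (p_clean - y) * w
eps = 0.2
x_adv = np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)

p_adv = sigmoid(w @ x_adv + b)
print(f"clean: {p_clean:.3f}  adversarial: {p_adv:.3f}")
```

Even this toy case shows the core mechanism the module builds on: a small, bounded perturbation in input space pushes the classifier's confidence toward misclassification.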

Project structure

Toolkit components

Mockup: an integrated interface that also surfaces contextual sociohistorical/epistemic aspects
  • Data provenance tool
  • Model analysis tool
  • Adversarial attack module (e.g. adversarial patches)

Work plan

gantt
    dateFormat YY-MM
    axisFormat %Y W%V
    todayMarker off

    section Project coordination
   %% Application submission  :milestone, submitted, 2021-11-24, 1d
    Project Start :milestone, startv, 22-05, 1d
    Hiring Soc. Phds : hiresocs, after startv , 60d
    Hiring Code Phds : hirecode, after startv , 121d

    section Methodology design
    Inaugural workshop :crit, ws1 , after hiresocs , 1d
    Sociotech. case study research design :scsrd, after ws1 , 183d
    Second workshop :crit, ws2 , after hirecode , 61d

    section Implement method—Data 
    Dataset toolkit : datatk , after hirecode , 244d
    Sociotech. case study data coll. :scsdg, after scsrd , 61d
    Sociotech. case study data & model forensics :crit , scsdatamodforensics , after scsdg , 61d
    
    section Implement/co-develop Model analysis
    Model forensics dev. : modeltk , after scsdg , 427d
    Sociotech. case study model forensics : scsmodfr , after scsdg , 122d
    Third workshop :crit, ws3 , after scsmodfr , 61d

    section Adversarial dev.
    Application forensics dev. :apptk , after scsmodfr , 397d
    Sociotech. case study model/appl. forensics : scsmodapplforensics , after ws3 , 61d
    Fourth workshop :crit, ws4 , after scsmodapplforensics , 61d

    section Sociotechnical Case-studies scholarly side
    Sociotech. case studies read-write : scsrw , after hiresocs , until integr

    section Integration
    Integration of toolkit and case studies :integr , after scsmodapplforensics , 365d

Figure 3: Original work plan implied intricate causal dependencies

Complications

The factors

Internal:

Organisational ‘entropy’

  • Institutional constellation shifts throughout
  • Hiring delays plus systemic rigidities
  • Temporal desynchronisation along three dimensions: 27 mo. between projects; 4 academic calendars; 9 h time difference

External:

Project begins 2022

  • pre-/post-“foundation model” worlds
  • discursive & technical moment:
    • instruct/chat models; oss models; transformers, language models
    • societal proliferation—incl. education, academia

gantt
    title Early timeline: AI Forensics and AI
    dateFormat  YYYY-MM-DD

    section Project-meta-news
    Draft proposal  :2021-06-15 , 1d
    Submission  :milestone, submitted, 2021-11-24, 0d
    Start :milestone, start2, 2022-11-15, 0d

    section AI Developments
    Instruct-tuning   :milestone, instruct, 2022-03-04, 0d
    Stable diffusion  :milestone, sd, 2022-09-01, 0d
    ChatGPT released  :milestone, chat, 2022-11-30, 0d
    LLaMA leaks       :milestone, lleak, 2023-03-03, 0d
    %% Timnit Gebru saga :

    section Case studies
    CS1a AI Regimes neural wiring visuality   :2023-03-01 , 2024-05-05
    CS2 Humanitarian AI    :2022-05-01 , 2024-05-05
    CS3 Design Social Diversity    :2022-11-01 , 2024-05-05
    CS4 AI in Science    :2023-01-31 , 2024-05-05

    section Forensics toolkit
    FTK-Data :2022-05-01 , 2023-05-01
    FTK-Data-postterm :done, 2023-05-01 , 2023-07-31
    FTK-Model :2023-03-01 , 2024-05-05
    %% FTK-App :2024-03-01 , 2024-05-05

    section MeetingsINT
    WS-sched-try1  :done, 2023-01-01  , 30d
    WS-sched-try2  :done, 2023-02-01  , 30d
    IRLmeeting     :2024-05-05 , 0d

Transformers’ gauntlet

The foundation-model/transformer/LLM era redrew the map of AI at a societal level, and thereby also directly bore on the normative impetus of the project itself:

  • Architectural homogenisation: transformer/attention paradigm
  • Generative AI’s sudden affinity with language and visual culture
  • Extreme polarisations:
    • Discursively: ‘Research’ AI discourse; control/regulatory debates
    • Access, in the proprietary/API-gated phase
    • Scale

Toolkit proposal vs. transformer paradigm

Across every dimension: scale and access

Putative toolkit components
Dataset
  • ≈ Internet-scale
  • Platformisation, data-extractivism
Model
  • Inner structures understood at only basic levels
  • API-gating
  • Size, parameter count
Application
  • Vast proliferation, along dimensions of
    • Automation & political economy
    • Economics of veridiction/epistemic poisoning
    • Education & adversariality

Morale

  • Haphazard momentum at the beginning (staffing, bureaucracy, planning overhead)—

  • …coincided with the discovery that a sizeable part of the common project was being rendered obsolete; assumptions had to be rethought…

  • and fed into a minor crisis of project coherence, including renaming discussions

…in retrospect: actually a productive time

What does interpreting AI mean?

Subprojects: AI Interpretability and Accountability in the Humanitarian Sector – Arif Kornweitz (PhD Candidate)

A Pedagogy of Machines: Technology in Education and Universities in Translation – Paolo Caffoni (PhD Candidate)

Matteo Pasquinelli (Supervisor) – KIM Research Group, HfG Karlsruhe

  • Technical idea of “mechanistic interpretability” is at stake—quite different from the notion of interpretation in humanities, social sciences, or cultural studies
  • A detour/return to study of art and culture e.g. (Pasquinelli and Kornweitz 2023; Kornweitz 2023) as well as (Impett 2023) and (Offert 2023)
  • Integration with the KIM colloquia and seminars of summer 2023 turns into a strong exploration/discussion asset
  • Shift to semiotics re: AI’s “linguistic turn”—nuancing common views on AI as ‘neo-structuralism’

Explainability beyond explainability

Subproject: AI Design Interventions for Social Diversity – Goda Klumbytė (PhD Candidate) and Claude Draude (Supervisor) – Participatory IT Design group, Universität Kassel

  • How can explainability/interpretability/understandability be addressed through design- and interaction-related concepts?
  • Feminist epistemologies highlighting pluriversal perspectives and the situatedness of knowledge (Klumbyte, Piehl, and Draude 2023b, 2023a)
  • Mapping the ‘design diagrams’—e.g. the organisation of the production processes creating AI models
  • Response-ability
  • Understand explainability and understandability not merely as properties of an AI system, but as emerging in interaction—pointing to roles of
    • interaction modalities (e.g. embodiment),
    • metaphors (e.g. focus on transparency as a pre-requisite)
    • sociocultural factors (e.g. uneven distribution of power and resources)

Explainability beyond explainability

Embodiment and tangibility

🖇 Leonardo Angelini et al., ‘Tangible LLMs: Tangible Sense-Making For Trustworthy Large Language Models’, in Proceedings of the Nineteenth International Conference on Tangible, Embedded, and Embodied Interaction, TEI ’25 (ACM, 2025) https://doi.org/10.1145/3689050.3708338.

  • Tangible interaction modalities for core LLM concepts such as input/output embedding, positional encoding and latent space, attention and temperature.
  • Example prototype: certainty/temperature via pressure: “squeeze the user’s hand when the LLM is uncertain” […]; the user’s response (pressure/grip) would then control the temperature parameter.
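The grip-to-temperature loop could be sketched roughly as follows. The function names and the mapping range are hypothetical, not taken from the cited paper; the sketch only shows how a normalised pressure reading might rescale next-token logits.

```python
import numpy as np

def pressure_to_temperature(pressure, t_min=0.2, t_max=1.5):
    """Map a grip-pressure reading in [0, 1] to a sampling temperature.
    Harder squeeze -> lower temperature (more deterministic output).
    The mapping and its range are illustrative assumptions."""
    return t_max - pressure * (t_max - t_min)

def softmax_with_temperature(logits, temperature):
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()                 # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = [2.0, 1.0, 0.5]         # made-up next-token logits

relaxed = softmax_with_temperature(logits, pressure_to_temperature(0.0))
squeezed = softmax_with_temperature(logits, pressure_to_temperature(1.0))

# A firm grip sharpens the distribution toward the top token.
print(relaxed.round(3), squeezed.round(3))
```

The design choice is that the tangible channel controls a single, well-understood sampling parameter, so the embodied interaction maps directly onto a mechanism the user can reason about.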

Materials produced during participatory design workshop on embodied explanations
| Researcher/s | Project | Object | Type |
|---|---|---|---|
| Offert/Cai/Kim | scs4 | Meta ESM2 | Model |
| Kornweitz | scs2 Humanitarian | EU Copernicus; Mueller/Rauh conflict prediction | Pipeline/model |
| Caffoni | Pedagogies U MLT | Meta NLLB-200 | Model |
| Dare | Generative AI Untoolkit | MS Celeb; NIST Mugshots | Dataset |
| Impett | Scopic regimes of neural wiring | Neocognitron; convolution | Model; (algorithmic) technique |
| Klumbytė/Draude | AI Design interventions for social diversity | Design processes and interaction paradigms | Processes |
| Offert/Impett | | OpenAI CLIP | Model |
| Offert | | OpenAI CLIP/DALL-E 2 | Model |
| Leask | Latent mechanistic interpretability | Sparse Autoencoder; BatchTopK SAEs | Algorithmic technique; Implementation |
| Harvey | exposing.ai | 32 datasets underlying facial recognition models | Datasets |

Models of?

Subprojects: AI

📄 Fabian Offert, ‘On the Concept of History (in Foundation Models)’, IMAGE 37 (22 May 2023): 121–34, https://doi.org/10.1453/1614-0885-1-2023-15462.

📃 Fabian Offert, Paul Kim, and Qiaoyu Cai, ‘Synthesizing Proteins on the Graphics Card. Protein Folding and the Limits of Critical AI Studies’ (arXiv, 7 December 2024) https://doi.org/10.48550/arXiv.2405.09788.

📘 Fabian Offert and Leonardo Impett, Vector knowledge (forthcoming).

  • protein folding, the genetic code, and the uncritical acceptance of the biosemiotic language paradigm in biology from the 20th century onward…

language as ‘the special case’: transformers are not really models of language (Offert, Kim, and Cai 2024)

Pedagogies of/with/after AI?

  • Cambridge – Dare – Generative AI UnToolkit
  • Karlsruhe – Caffoni – A Pedagogy of Machines
  • Kassel – Klumbytė/Draude – useXAI: The Use and Explainability Needs of Generative AI Tools among Students (led by the Participatory IT Design group in Kassel, in collaboration with the Faculty of Social Sciences and Human Sciences)

Medical AI

G. G.

Mechanistic interpretability

Subproject “Latent mechanistic interpretability,” Patrick Leask (PhD student) and Noura Al Moubayed (advisor) at Durham University

  • Sparse AutoEncoders for monosemantic features
  • BatchTopK
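A minimal sketch of the BatchTopK mechanism, assuming random stand-in weights and activations (a real SAE is trained on a model's internal activations; shapes and the value of k here are arbitrary):

```python
import numpy as np

# Minimal BatchTopK sparse-autoencoder forward pass in NumPy.
# All weights are random placeholders, purely to show the mechanism.
rng = np.random.default_rng(0)

d_model, d_sae, batch, k = 16, 64, 8, 4   # k = average active features per sample

W_enc = rng.normal(scale=0.1, size=(d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(scale=0.1, size=(d_sae, d_model))
b_dec = np.zeros(d_model)

x = rng.normal(size=(batch, d_model))      # stand-in model activations

pre = np.maximum(x @ W_enc + b_enc, 0.0)   # ReLU feature pre-activations

# BatchTopK: keep the batch*k largest activations across the WHOLE batch,
# rather than the top k within each sample, so sparsity is a batch average
# and individual samples may use more or fewer features.
flat = pre.ravel()
threshold = np.sort(flat)[-batch * k]      # value of the (batch*k)-th largest
feats = np.where(pre >= threshold, pre, 0.0)

x_hat = feats @ W_dec + b_dec              # sparse reconstruction

print("active features:", int((feats > 0).sum()), "of", batch * d_sae)
```

The batch-level threshold is the point of the variant: per-sample TopK forces exactly k features on every input, while BatchTopK lets "feature-rich" inputs recruit more features at the same average sparsity.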

Figure 4: Meta SAE Dashboard for feature exploration

Rediscovering the ‘soul of the project’

or: “the importance of meeting in-person”

the friction between the social and the technical

sensitivity to how ways of thinking about social forms find their way into technical forms and have a sort of Nachleben

  • affinity/proximity to the notion of boundary object (Star 1989, 2010; Star and Ruhleder 1996)

  • understanding of these systems as epistemic machines

  • Glossary: Interpreting machine learning: a sociotechnical glossary

Output/s

Collective output

📘 Matteo Pasquinelli, Claude Draude, Noura Al Moubayed, Leonardo Impett, and Fabian Offert (Eds.), Interpreting machine learning: a sociotechnical glossary (forthcoming/2025).

  • boundary concepts
  • latent space

various projects

Output/s

After the ‘Full stack interface’

  • SAE dashboard/s
  • Critical-pedagogical interfaces
  • Embodiment
  • Latent space of glossary

References

Impett, Leonardo. 2023. “A History of Machine Visuality,” April. https://fromhypetoreality.com/.
Klumbyte, Goda, Hannah Piehl, and Claude Draude. 2023a. “Feminist Epistemology for Machine Learning Systems Design.” https://doi.org/10.48550/ARXIV.2310.13721.
———. 2023b. “Towards Feminist Intersectional XAI: From Explainability to Response-Ability.” https://doi.org/10.48550/ARXIV.2305.03375.
Kornweitz, Arif. 2023. “AI en de regels van de kunst.” Metropolis M 2023 (November). https://metropolism.com/nl/feature/50950_ai_en_de_regels_van_de_kunst/.
Offert, Fabian. 2023. “What Are Large Visual Models Models Of?” April. https://fromhypetoreality.com/.
Offert, Fabian, Paul Kim, and Qiaoyu Cai. 2024. “Synthesizing Proteins on the Graphics Card. Protein Folding and the Limits of Critical AI Studies.” https://doi.org/10.48550/ARXIV.2405.09788.
Pasquinelli, Matteo, and Arif Kornweitz. 2023. “The Sound of Multidimensional Space: How Avant-Garde Music Foreshadowed Artificial Intelligence.” In Catalogue of the International Festival of Contemporary Music 67, 86–93. Venice: La Biennale di Venezia; NERO.
Star, Susan Leigh. 1989. “The Structure of Ill-Structured Solutions: Boundary Objects and Heterogeneous Distributed Problem Solving.” In Distributed Artificial Intelligence, Volume II, edited by Les Gasser and Michael N. Huhns, 37–54. San Francisco (CA): Morgan Kaufmann. https://www.sciencedirect.com/science/article/pii/B978155860092850006X.
———. 2010. “This Is Not a Boundary Object: Reflections on the Origin of a Concept.” Science, Technology, & Human Values 35 (5): 601–17. https://doi.org/10.1177/0162243910377624.
Star, Susan Leigh, and Karen Ruhleder. 1996. “Steps Toward an Ecology of Infrastructure: Design and Access for Large Information Spaces.” Information Systems Research 7 (1): 111–34. https://doi.org/10.1287/isre.7.1.111.